Skip to content

Conversation

@cmp0xff
Copy link
Contributor

@cmp0xff cmp0xff commented Oct 30, 2025

Comment on lines -19 to +22
) -> tuple[np.ndarray, np.ndarray]: ...
) -> tuple[npt.NDArray[Any], npt.NDArray[Any]]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a chance to make this more precise, like

tuple[np_1d_array[np.intp], np_1d_array[np.intp]]

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no insight into this class / this class method, nor is it documented in any detail (personally I don't even know what "window bounds" means). It would be great if you suggest a few tests, or can we leave it in another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure happy to leave it til later!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the change suggested by @MarcoGorelli . See the docs here: https://pandas.pydata.org/docs/dev/user_guide/window.html#custom-window-rolling

The arrays have to be np_ndarray_int64


class CategoricalIndex(ExtensionIndex[S1], accessor.PandasDelegate):
codes: np.ndarray = ...
codes: npt.NDArray[Any] = ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np_1d_array[np.intp]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@cmp0xff cmp0xff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/pandas_nightly

Comment on lines -19 to +22
) -> tuple[np.ndarray, np.ndarray]: ...
) -> tuple[npt.NDArray[Any], npt.NDArray[Any]]: ...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no insight into this class / this class method, nor is it documented in any detail (personally I don't even know what "window bounds" means). It would be great if you suggest a few tests, or can we leave it in another PR?


class CategoricalIndex(ExtensionIndex[S1], accessor.PandasDelegate):
codes: np.ndarray = ...
codes: npt.NDArray[Any] = ...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@property
def is_monotonic_increasing(self) -> bool: ...
def clear_mapping(self) -> None: ...
class IntervalTree(IntervalMixin): ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the above comment is right, this can probably just be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class IntervalTree is something implemented in pandas with Cython. I don't understand what's happening with Ctyhon, but would like to leave its presence in the stubs, even though it's empty. Please tell me if you have a good argument to remove it completely.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a documented class, so we should remove it from the stubs. And then change the references in _MidDescriptor and _LengthDescriptor to be IntervalMixin

| np_ndarray_float
| np_ndarray_complex
| dict[str, np.ndarray]
| dict[str, np_1darray[Any]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the general pattern you seem to be following is:

  • for function arguments, don't be restrictive on shape
  • for return types, be as precise as possible

Given that NumListLike is often used as argument, would this one be better as Any shape?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for function arguments, don't be restrictive on shape

This is relevant until the end of python 3.10 because it can only use an old version of numpy, which does not give the correct shape upon construction, i.e. np.array([1, 2, 3]) does not give a 1-d array with that old version, in static typing.

would this one be better as Any shape?

Yep dc85974

Copy link
Contributor Author

@cmp0xff cmp0xff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/pandas_nightly

@property
def is_monotonic_increasing(self) -> bool: ...
def clear_mapping(self) -> None: ...
class IntervalTree(IntervalMixin): ...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class IntervalTree is something implemented in pandas with Cython. I don't understand what's happening with Ctyhon, but would like to leave its presence in the stubs, even though it's empty. Please tell me if you have a good argument to remove it completely.

| np_ndarray_float
| np_ndarray_complex
| dict[str, np.ndarray]
| dict[str, np_1darray[Any]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for function arguments, don't be restrictive on shape

This is relevant until the end of python 3.10 because it can only use an old version of numpy, which does not give the correct shape upon construction, i.e. np.array([1, 2, 3]) does not give a 1-d array with that old version, in static typing.

would this one be better as Any shape?

Yep dc85974

@cmp0xff cmp0xff added the Compat pandas objects compatability with Numpy or Python functions label Nov 4, 2025
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few main issues:

  1. I think we should be consistent in the stubs in using the np_ndarray_xxx types that are now in _typing.pyi , rather than npt.NDArray[np.integer], etc. I think it's better to use those types that we have in _typing.pyi so we can be consistent througout the stubs.
  2. pd.unique(someIndex) and Index.unique() are a bit of a mess, because the behavior changes from 2.3 to 3.0. See the release notes: https://pandas.pydata.org/docs/dev/whatsnew/v3.0.0.html#other . So what we need to do here is return the numpy types as is done in 2.3, and put in a comment to switch to the Index types when 3.0 is released.
  3. I don't think we should use npt.NDArray[np.double] in the stubs. We always use np.floating
  4. There are places where we can be more precise on what are the allowed numpy array types as arguments.

I didn't necessarily pick up everything related to the main issues, and some of my comments are related to the 3 issues above.

@property
def is_monotonic_increasing(self) -> bool: ...
def clear_mapping(self) -> None: ...
class IntervalTree(IntervalMixin): ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a documented class, so we should remove it from the stubs. And then change the references in _MidDescriptor and _LengthDescriptor to be IntervalMixin

def __iter__(self): ...
@property
def asi8(self) -> np.ndarray: ...
def asi8(self) -> np_1darray[Any]: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result of this will be np_ndarray_int64

ignore_na: bool = ...,
axis: Axis = ...,
times: str | np.ndarray | Series | np.timedelta64 | None = ...,
times: str | npt.NDArray[Any] | Series | np.timedelta64 | None = ...,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
times: str | npt.NDArray[Any] | Series | np.timedelta64 | None = ...,
times: str | np_ndarray_dt | Series | np.timedelta64 | None = ...,

has to be datetimes

Comment on lines +78 to +79
@cache_readonly
def groups(self) -> PrettyDict[Hashable, Index]: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you added this. I think we can delete the class Grouping, because it is only used by the class BaseGrouper, which is only used by the class BinGrouper, which is never referenced.

None of those are documented.

Comment on lines -19 to +22
) -> tuple[np.ndarray, np.ndarray]: ...
) -> tuple[npt.NDArray[Any], npt.NDArray[Any]]: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the change suggested by @MarcoGorelli . See the docs here: https://pandas.pydata.org/docs/dev/user_guide/window.html#custom-window-rolling

The arrays have to be np_ndarray_int64

ignore_na: _bool = False,
axis: Axis = 0,
times: np.ndarray | Series | None = None,
times: npt.NDArray[Any] | Series | None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
times: npt.NDArray[Any] | Series | None = None,
times: np_ndarray_dt | Series | None = None,

)
check(
assert_type(pd.unique(pd.RangeIndex(0, 10)), np.ndarray),
assert_type(pd.unique(pd.RangeIndex(0, 10)), np_1darray[Any] | pd.Index),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think this can be a numpy array

assert_type(
pd.unique(pd.timedelta_range(start="1 day", periods=4)),
np.ndarray,
np_1darray[Any] | pd.Index,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UGH - looks like the type depends on the version of pandas.

So what we should do is in the stubs, use the numpy array as the return type. Add a comment that it will switch to the Index type with 3.0, and we'll make the change then

I think this is true for many of the pd.unique() overloads.

Comment on lines +987 to +989
assert_type(pd.unique(dti), np_1darray[np.datetime64] | pd.DatetimeIndex),
np_1darray if PD_LTE_23 else pd.DatetimeIndex,
np.datetime64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue here with the change from 2.3 to 3.0. So stubs should return the 2.3 types, and we'll change it when 3.0 is released.

check(assert_type(d1, npt.NDArray), np.ndarray)
check(assert_type(e0, npt.NDArray[np.intp]), np.ndarray)
check(assert_type(e1, npt.NDArray), np.ndarray)
check(assert_type(d1, np_1darray[np.double]), np_1darray[np.double])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we should use np_1darray[np.double] in our stubs. Just use np_ndarray_float

same in tests below

@cmp0xff cmp0xff marked this pull request as draft November 4, 2025 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Compat pandas objects compatability with Numpy or Python functions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants